Genome-Wide and Transcriptome-Wide Association Studies on Northern New England and Ohio Amyotrophic Lateral Sclerosis Cohorts

Background and Objectives Amyotrophic lateral sclerosis (ALS) is an age-associated, fatal neurodegenerative disorder causing progressive paralysis and respiratory failure. The genetic architecture of ALS is still largely unknown.
Methods We performed a genome-wide association study (GWAS) and transcriptome-wide association study (TWAS) to understand genetic risk factors for ALS using a population-based case-control study of 435 ALS cases and 279 controls from Northern New England and Ohio. Single nucleotide polymorphism (SNP) genotyping was conducted using the Illumina NeuroChip array. Odds ratios were estimated using covariate-adjusted logistic regression. We also performed a genome-wide SNP-smoking interaction screening. TWAS analyses used PrediXcan to estimate associations between predicted gene expression levels across 15 tissues (13 brain tissues, skeletal muscle, and whole blood) and ALS risk.
Results GWAS analyses identified the p.A382T missense variant (rs367543041, p = 3.95E-6) in the TARDBP gene, which has previously been reported in association with increased ALS risk and was found to share a close affinity with the Sardinian haplotype. Both GWAS and TWAS analyses suggested that ZNF235 is associated with decreased ALS risk.
Discussion Our results support the need for future evaluation to clarify the role of these potential genetic risk factors for ALS and to understand genetic susceptibility to environmental risk factors.


Introduction
Amyotrophic lateral sclerosis (ALS) is an age-associated, fatal neurodegenerative disorder causing progressive paralysis and respiratory failure. 1,2 With approximately 400,000 individuals worldwide estimated to be afflicted with ALS by 2040, 2 ALS is the third most prevalent neurodegenerative disorder after Alzheimer disease (AD) and Parkinson disease (PD). 3,4 Family history of ALS is identifiable in approximately 10% of ALS cases, with the remaining 90% being of sporadic origin. 2,5 In addition, a recent study found that ALS has an elevated prevalence in Northern New England compared with other regions in the United States. 6 Therefore, it is important to understand genetic and environmental risk factors specific to this region.
Since 1993, researchers have made substantial progress in unraveling genetic mechanisms involved in ALS development and progression, leading to the identification of over 2 dozen genes associated with the disorder. 2,7 Recent progress in genotyping and sequencing technologies has improved our understanding of ALS pathology. 3 Multiple studies have revealed overlapping susceptibility variants of ALS with other neurodegenerative diseases, including frontotemporal dementia (FTD), AD, and PD. 1,5,8 Exploring genetic risk factors for ALS can help uncover etiopathogenic mechanisms across the spectrum of neurodegenerative diseases and potentially unveil fundamental processes involved in neuronal degeneration. 1][10] Transcriptome-wide association studies (TWAS) can clarify the association between genetically regulated gene expression and ALS risk and potentially identify novel genes related to ALS risk while reducing the multiple testing burden. 11 Previous TWAS have successfully identified several ALS-associated genes expressed in various brain-related tissues and blood. 11,12 Despite multiple studies indicating that ALS has a moderately high heritability (40%-60%), 12,13 previously identified loci account for only a small proportion of the overall genetic predisposition to ALS. 3,5,7
Smoking is a known risk factor for ALS. 14 It may interact with genetic factors to influence the risk of developing ALS. For instance, smoking has been shown to induce oxidative stress, which is associated with higher ALS risk. 14 Therefore, smoking may interact with the ALS risk gene SOD1, which plays a critical role in regulating oxidative stress. 15 Very few previous studies have examined genome-wide smoking-gene interactions associated with ALS risk, and our research aims to bridge this gap. Seeking more insight into the genetic architecture of ALS, this study integrates GWAS and TWAS methods to detect genetic risk factors and assesses gene-smoking interactions using sporadic ALS cases and controls based in Northern New England and Ohio, collected in part through previous studies.

Study Population
The enrollment procedure for ALS cases and controls is outlined by Andrew et al. 16,17 In summary, we recruited ALS cases and controls from Northern New England and Ohio, with their signed consent to provide blood or saliva samples and demographic and clinical information and to complete the environmental questionnaire. Cases were newly diagnosed patients with ALS from medical centers in these regions. Controls consisted of both population controls and clinic controls. Population controls were recruited randomly by mail using the US Postal Service Delivery Sequence File (USPS DSF2). Clinic controls were patients diagnosed with non-neurodegenerative diseases. All participants were aged at least 18 years.
Between 2020 and 2023, the Laboratory of Neurogenetics at the National Institute on Aging genotyped DNA from 435 ALS cases and 279 controls recruited in Northern New England and Ohio. Genotyping was conducted according to the manufacturer's instructions using the Illumina NeuroChip, a platform designed to target curated variants in neurologic diseases. 18 We measured genotypes for 487,374 single nucleotide polymorphisms (SNPs) from the arrays prior to quality control filtering, including 305,670 SNPs from a GWAS backbone and 179,467 custom SNPs selected throughout the genome.

Quality Control and Genotype Imputation
We used PLINK 19,20 software to perform standard quality control procedures on the genotype data, following the steps outlined by Chia et al. 21 Briefly, we excluded samples with over 5% missing genotypes based on the sample call rate and removed samples with heterozygosity values beyond a threshold (F > 0.15 or F < −0.15). Using the HapMap 3 Genome Reference Panel 22 as the reference for ancestral information, we removed individuals identified as non-European by principal component analysis because of their low numbers in the New Hampshire population. Given that most instances of ALS are sporadic, we excluded all familial ALS cases; however, we cannot entirely exclude the rare occurrence of a monogenic gene variant because sporadic cases were not undergoing clinical genetic testing when this study was performed. In addition, we removed variants (1) with over 5% missing genotypes, (2) with a minor allele frequency below 5%, (3) deviating from Hardy-Weinberg equilibrium (p < 1.0E-3), or (4) with a p-value below 1E-4 in the case/control nonrandom missingness test.
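The first three variant-level filters can be illustrated with a short sketch. This is not the PLINK implementation: the function name `variant_qc` and the toy genotype vectors are hypothetical, and the Hardy-Weinberg check here uses a 1-degree-of-freedom chi-square approximation as an assumption, whereas PLINK offers an exact test. The case/control missingness test is omitted.

```python
import math

def variant_qc(genotypes, max_missing=0.05, min_maf=0.05, hwe_p=1e-3):
    """Apply missingness, MAF, and HWE filters to one variant.

    genotypes: list of allele counts (0, 1, 2), with None for a missing call.
    Returns (passed, reason). Thresholds mirror those quoted in the text.
    """
    called = [g for g in genotypes if g is not None]
    missing_rate = 1.0 - len(called) / len(genotypes)
    if missing_rate > max_missing:
        return False, "missingness"

    n = len(called)
    n_hom_ref, n_het, n_hom_alt = called.count(0), called.count(1), called.count(2)
    p = (2 * n_hom_alt + n_het) / (2 * n)   # frequency of the counted allele
    q = 1.0 - p
    if min(p, q) < min_maf:
        return False, "maf"

    # Chi-square goodness of fit to Hardy-Weinberg proportions (1 df).
    # Assumption: an approximation, not PLINK's exact test.
    expected = [n * q * q, 2 * n * p * q, n * p * p]
    observed = [n_hom_ref, n_het, n_hom_alt]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_hwe = math.erfc(math.sqrt(chi2 / 2.0))  # survival function of chi-square, 1 df
    if p_hwe < hwe_p:
        return False, "hwe"
    return True, "pass"

# Toy examples (hypothetical data)
print(variant_qc([0] * 50 + [1] * 40 + [2] * 10))  # close to HWE proportions
print(variant_qc([1] * 100))                       # all heterozygous
```

Applying the filters in a fixed order, as here, means each variant is reported with the first criterion it fails; a real pipeline might instead record all failing criteria.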
After quality control, 613 individuals, including 378 sporadic ALS cases and 235 controls, were included in analyses, and 242,090 SNPs were available for imputation. We conducted genotype imputation using the Michigan Imputation Server 23 on the GRCh37/hg19 build, with the European population data of the 1000 Genomes Project 24,25 (phase 3, version 5, available at reference 26) as the imputation reference. Only SNPs with an imputation accuracy R² ≥ 0.3 were included in analyses. The quantile-quantile (QQ) plots show no genomic inflation after quality control (eFigure 1).
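A common numerical companion to a QQ plot is the genomic inflation factor λ: the median observed 1-degree-of-freedom chi-square statistic divided by its expected median under the null (about 0.455). The text reports only the QQ plots, so the sketch below is an assumption about how such a check could be made, not a description of the authors' pipeline; `genomic_inflation` is a hypothetical helper.

```python
from statistics import NormalDist, median

def genomic_inflation(pvalues):
    """Estimate lambda from two-sided p-values (each strictly between 0 and 1).

    Each p-value is converted back to a 1-df chi-square statistic via the
    squared normal quantile; p-values so small that 1 - p/2 rounds to 1.0
    would need a more careful conversion than this sketch provides.
    """
    norm = NormalDist()
    chi2 = [norm.inv_cdf(1.0 - p / 2.0) ** 2 for p in pvalues]
    return median(chi2) / 0.4549364  # median of a chi-square with 1 df

# Evenly spaced p-values mimic a null distribution, so lambda is close to 1.
null_like = [(i + 0.5) / 1001 for i in range(1001)]
print(round(genomic_inflation(null_like), 3))
```

Values of λ near 1 are usually read as no appreciable inflation, while values well above 1 suggest residual population structure or other systematic bias.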

Genome-Wide Association Analysis
For the GWAS analysis, we performed covariate-adjusted logistic regression using PLINK, adjusting for sex, age at symptom onset, and the first 10 principal components of genetic ancestry. The Manhattan plot was generated using the "CMplot" 27 package in R version 4.0.2. We sought to validate the top SNP from our GWAS, a variant reported in previous studies, in publicly available data with a larger sample size, conducting logistic regression and adjusting for the same covariates. For this validation analysis, genotype data were obtained from 10,067 ALS cases and 2,251 controls from the database of Genotypes and Phenotypes (dbGaP) 28 with the study accession number phs000101.v5.p1. To balance the case-control ratio, we included an additional 11,887 controls from 2 other dbGaP data sets (phs000187 and phs000428). After quality control, 22,419 individuals and 335,021 variants were available for imputation in the validation study. Genotype imputation for the validation data was also conducted using the Michigan Imputation Server. A threshold p-value of 5E-8 was set for genome-wide significance after Bonferroni correction for multiple testing in the GWAS.
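To make the per-SNP test concrete, the sketch below fits a logistic regression by Newton-Raphson and reports the odds ratio with a Wald p-value. It is a minimal illustration under stated assumptions, not PLINK's implementation: `solve` and `fit_logistic` are hypothetical helpers, the counts are invented, and the toy model contains a single carrier indicator, whereas the analysis described above also includes sex, age at symptom onset, and 10 principal components as extra columns.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = s / m[i][i]
    return x

def fit_logistic(rows, y, n_iter=25):
    """Fit logit P(y=1) = b0 + b.x; rows hold predictors without the intercept.

    Returns (coefficients, standard errors), intercept first.
    """
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(n_iter):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for xi, yi in zip(X, y):
            eta = sum(b * v for b, v in zip(beta, xi))
            mu = 1.0 / (1.0 + math.exp(-eta))
            w = mu * (1.0 - mu)
            for a in range(k):
                grad[a] += (yi - mu) * xi[a]
                for c in range(k):
                    hess[a][c] += w * xi[a] * xi[c]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    # Standard errors from the diagonal of the inverse information matrix.
    se = []
    for j in range(k):
        unit = [1.0 if i == j else 0.0 for i in range(k)]
        se.append(math.sqrt(solve(hess, unit)[j]))
    return beta, se

# Hypothetical counts: 30 carrier cases, 10 carrier controls,
# 20 non-carrier cases, 40 non-carrier controls.
rows = [[1]] * 40 + [[0]] * 60
y = [1] * 30 + [0] * 10 + [1] * 20 + [0] * 40
beta, se = fit_logistic(rows, y)
odds_ratio = math.exp(beta[1])
p_value = math.erfc(abs(beta[1] / se[1]) / math.sqrt(2))
print(odds_ratio, p_value)
```

With a single binary predictor, the fitted odds ratio equals the cross-product ratio of the 2x2 table, which gives a quick sanity check on the fitting routine before covariates are added.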
In addition to examining the main effect of SNPs, we also evaluated SNP-smoking interactions associated with ALS susceptibility. We performed interaction analysis by including cigarette smoking status (ever-smoker vs never-smoker) and a multiplicative SNP-smoking interaction term in the covariate-adjusted models. Participants without smoking status were removed from this analysis. Interaction analyses could not be pursued using the validation data set because smoking status was unavailable.
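The meaning of a multiplicative interaction term can be shown with a stratified 2x2 example. In a model containing only a binary carrier indicator, smoking status, and their product, the exponentiated interaction coefficient equals the ratio of the SNP odds ratio among ever-smokers to that among never-smokers. The sketch below computes that ratio with a Wald test; the function names and all counts are hypothetical, and the covariates used in the actual analysis are left out.

```python
import math

def odds_ratio(case_carrier, ctrl_carrier, case_noncarrier, ctrl_noncarrier):
    """Cross-product odds ratio from a 2x2 table."""
    return (case_carrier * ctrl_noncarrier) / (ctrl_carrier * case_noncarrier)

def interaction_ratio(ever_smokers, never_smokers):
    """Ratio of stratum-specific odds ratios with a two-sided Wald p-value.

    Each argument is a tuple:
    (carrier cases, carrier controls, non-carrier cases, non-carrier controls).
    """
    ratio = odds_ratio(*ever_smokers) / odds_ratio(*never_smokers)
    # Standard error of the log ratio: sum of reciprocal counts over all 8 cells.
    se = math.sqrt(sum(1.0 / count for count in ever_smokers + never_smokers))
    z = math.log(ratio) / se
    p = math.erfc(abs(z) / math.sqrt(2))
    return ratio, p

# Hypothetical counts: the SNP odds ratio is 6 in ever-smokers and 1 in never-smokers.
ratio, p = interaction_ratio((30, 10, 20, 40), (20, 20, 25, 25))
print(ratio, p)
```

A ratio of 1 corresponds to no interaction on the multiplicative scale, meaning the SNP has the same odds ratio regardless of smoking status.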

Transcriptome-Wide Association Analysis
We employed the widely used TWAS approach PrediXcan 29 to predict gene expression levels for participants from the Northern New England and Ohio ALS cohort. PrediXcan trains predictive models using reference data sets consisting of transcriptome and genotype information. Prediction weights were obtained from PredictDB, which derived these weights through the elastic net method using the Genotype-Tissue Expression (GTEx) project version 7 as the reference panel. 29,30 We examined associations between ALS risk and predicted gene expression levels across 15 tissues related to ALS, which included 13 brain and spinal cord regions (amygdala, anterior cingulate cortex BA24, caudate, cerebellar hemisphere, cerebellum, nucleus accumbens, cortex, frontal cortex BA9, hippocampus, hypothalamus, putamen, substantia nigra, C1 spinal cord), skeletal muscle, and whole blood tissues. We standardized predicted gene expression levels and tested associations using logistic regression with adjustment for sex, age at symptom onset, and the first 3 principal components of genetic ancestry. A false discovery rate (FDR) of 0.30 was used as the threshold for suggestively significant gene expression levels.
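The prediction step can be summarized as a weighted sum: for each gene and tissue, predicted expression is the sum over model SNPs of the prediction weight times the individual's allele dosage, and the resulting values are standardized before regression. The sketch below illustrates this under stated assumptions; the SNP identifiers, weights, and function names are hypothetical, and treating a SNP absent from the genotype data as contributing zero is a simplification rather than documented PrediXcan behavior.

```python
import statistics

def predict_expression(dosages, weights):
    """Weighted sum of effect-allele dosages for one person and one gene.

    dosages: {snp_id: dosage in [0, 2]}; weights: {snp_id: prediction weight}.
    SNPs absent from `dosages` contribute nothing (an assumption of this sketch).
    """
    return sum(w * dosages.get(snp, 0.0) for snp, w in weights.items())

def standardize(values):
    """Center to mean 0 and scale to unit sample standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

# Hypothetical weights for one gene in one tissue, and three participants.
weights = {"snp_a": 0.5, "snp_b": -0.2}
cohort = [
    {"snp_a": 2, "snp_b": 1},
    {"snp_a": 0, "snp_b": 2},
    {"snp_a": 1, "snp_b": 0},
]
predicted = [predict_expression(person, weights) for person in cohort]
print(standardize(predicted))
```

After standardization, the regression coefficient for a gene is interpreted per 1 standard deviation increase in predicted expression, which makes effect sizes comparable across genes and tissues.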

Standard Protocol Approvals, Registrations, and Patient Consents
All participants provided consent. All study procedures were approved by the Committee for the Protection of Human Subjects at Dartmouth Health.

Data Availability
We are in the process of uploading the genotype data used in this study to dbGaP with the accession number phs000101. Once processed, the data will be available through application on dbGaP.

Genome-Wide Association Study Between ALS Risk and SNPs
A total of 378 ALS cases and 235 controls of European ancestry passed quality control. Participant demographics are provided in eTable 1. We calculated 14,125,267 association statistics for imputed genotypes. None of the SNPs passed the genome-wide significance threshold of 5E-8, but there were 150 SNPs with p-values less than 1E-5 (eTable 2). These 150 SNPs were localized to 12 cytogenetic locations (Figure 1).
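The two thresholds used here can be expressed as a small classification rule. The 5E-8 cutoff is conventionally motivated as a Bonferroni correction of 0.05 for roughly one million independent common-variant tests; that test count is an assumption of this sketch and is not stated in the text, and the function names are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that controls the family-wise error at alpha."""
    return alpha / n_tests

def classify(p, genome_wide=5e-8, suggestive=1e-5):
    """Label a p-value using the genome-wide and suggestive cutoffs from the text."""
    if p < genome_wide:
        return "genome-wide significant"
    if p < suggestive:
        return "suggestive"
    return "not significant"

# Assumed one million independent tests (conventional, not taken from the paper).
print(bonferroni_threshold(0.05, 1_000_000))
# p-value reported for rs367543041 in the abstract.
print(classify(3.95e-6))
```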
The variants of suggestive statistical significance were all located within 15 genes. Table 1 lists the most statistically significant variant for each of the 15 genes. 2][33] The TARDBP variant rs367543041 was not available among imputed variants in the validation data set from dbGaP. Instead, we tested 2 nearby variants upstream and downstream, rs3835416 and rs148414479, as proxies. The association p-values for the 2 variants were 0.008 and 0.202, respectively.

Association Study Between ALS Risk and SNP-Smoking Interactions
For SNP-smoking interaction analysis, 276 ALS cases and 230 controls had available data on smoking status. We found 19 SNPs from 7 cytogenetic locations with evidence of interaction with smoking based on the p < 1E-5 threshold (Figure 2). None of the p-values for interaction reached p < 5E-8. Among these 19 SNPs, rs3815479 and rs201995562 are intronic variants located in the GDF3 gene (growth differentiation factor 3) and the MYO5B gene (myosin VB), respectively, as displayed in Table 2. The rest of the SNPs were not located within any genes. The entire list of the 19 SNPs with evidence of interaction with smoking is provided in eTable 3.

Transcriptome-Wide Association Study of Tissue-Specific Predicted Gene Expression Levels and ALS Risk
We identified 8 genes from 5 tissues suggestively associated with ALS risk with FDR-adjusted p < 0.30, as displayed in Table 3. Higher predicted expression levels of ZNF235 showed marginally significant associations with lower ALS risk in the brain caudate tissue (p = 6.76E-5, FDR = 0.14) and skeletal muscle tissue (p = 3.84E-5, FDR = 0.29). SNPs within ZNF235 were also found to be associated with ALS risk in our GWAS analysis. The Miami plots, including both GWAS and TWAS results in brain caudate and skeletal muscle within the 1 Mb region of ZNF235, can be found in eFigure 2.
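The relationship between a raw p-value and its FDR-adjusted value depends on how many genes were tested and where the gene ranks among them. The text does not name the FDR procedure, so the sketch below assumes the Benjamini-Hochberg step-up method; `benjamini_hochberg` is a hypothetical helper and the p-values are toy numbers.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Toy p-values for four genes tested in one tissue (hypothetical).
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Because the adjustment scales with the number of tests, a smaller raw p-value can receive a larger adjusted value when more genes are tested, as would happen if the adjustment were applied separately within each tissue. This is one possible reading of the ZNF235 results above, where the skeletal muscle association has the smaller p-value but the larger FDR.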

Discussion
In this study, we conducted GWAS and TWAS on ALS cases and controls from Northern New England and Ohio to explore the underlying genetic architecture of ALS. Our GWAS identified 15 genes harboring SNPs with suggestive significance for ALS risk (p < 1E-5), while TWAS identified 8 genes whose predicted expression levels across 5 tissues were suggestively associated with ALS risk (FDR < 0.3).
For GWAS findings, we identified a suggestively significant variant, rs367543041, located within the ALS-associated gene TARDBP. 2][33] The TARDBP variation, particularly the p.A382T missense variant (rs367543041), has been linked to approximately 30% of ALS cases within the genetically conserved Sardinian population. 32,33 Although we could not directly replicate this variant in the larger dbGaP data set because this SNP was not included in the imputed genotypes, we observed an association with a nearby proxy SNP, supporting the involvement of TARDBP variation in ALS susceptibility. The GWAS also identified KIF1B, a member of the kinesin family, as associated with ALS risk. 34 While there is no direct evidence linking KIF1B variants to ALS, 35 a study observed a differential regulation of KIF1B in sciatic nerve cells and the spinal cord, suggesting its potential significance in ALS. 36 In addition, another member of the kinesin family, KIF5A, has been identified as associated with ALS in previous studies. 10,37 Another gene identified as suggestively significant in the GWAS is MACROD2, a mono-ADP ribosylhydrolase that responds to DNA damage by nuclear export to the cytoplasm. 38 In previous studies, MACROD2 has been reported as a neurodevelopmental-related gene, 39,40 recognized as a susceptibility gene for autism spectrum disorders and schizophrenia. 39,41
We additionally explored SNP-smoking interactions, identifying variants in GDF3 and MYO5B as suggestively interacting with cigarette smoking to influence ALS risk. GDF3 is a member of the growth differentiation factors, which constitute a subfamily of the transforming growth factor-β (TGF-β) superfamily. 42,43 GDF3 has been identified as an inhibitor of bone morphogenetic proteins (BMPs). 44 BMPs play a key role in inducing the formation of cartilage, bone, and skeletal muscle. 45,46 The participation of GDF3 in the development of bone and cartilage, acting as an inhibitor of BMPs, may provide insights into its potential association with ALS. MYO5B is a member of the class V myosins participating in intracellular transport. 47,48 Another member of the class V myosins, MYO5C, showed an association with late-onset AD based on its gene expression level. 49 Although SNP-smoking interactions were identified for GDF3 and MYO5B, the lack of smoking data in our validation cohort precluded replicating these findings. In future analyses, we plan to leverage geographic information system technology and pollution databases from government agencies to explore more potential gene-environment interactions within our cohort, including lead, mercury, pesticides, and air pollution.
For TWAS findings, our results indicate ZNF235 as a potential ALS risk gene. ZNF235 encodes a zinc finger protein that acts as a transcriptional repressor, potentially participating in neuronal differentiation. 50,51 Both GWAS and TWAS analyses suggest associations between ZNF235 and ALS, with predicted ZNF235 expression levels in the caudate and skeletal muscle tissues showing reduced expression levels among ALS cases. Despite limited knowledge regarding its role in ALS, our results suggest ZNF235 as a candidate gene warranting further functional investigation. Another TWAS-identified gene, CEP43 (also known as FGFR1OP), is a fusion partner for FGFR1. 52 A previous study indicates that FGFR1 can mediate motor neuron apoptosis in ALS. 53
Our study had several limitations. The modest sample size likely constrained our power to detect genome-wide significant associations. As a result, our study failed to identify significant associations with other known ALS genetic risk factors, such as SOD1, NUP50, and ERBB4, possibly because of differences in population structure or sample size. To mitigate this limitation, we sought to supplement our cohort with a larger dbGaP data set, but this introduced challenges with genotyping platform differences and unavailable smoking data. Future studies with expanded sample sizes and harmonized genotyping arrays are needed. This study was restricted to individuals of European ancestry. Future research should also include more populations to validate or extend our findings across different ancestral groups and geographical regions. In addition, the suggestively significant variant in TARDBP was not directly validated in the dbGaP database, so we can only regard it as a tentative association. Despite these constraints, our integrated GWAS and TWAS of ALS provided useful insights into genetic susceptibility. Moreover, TWAS enabled us to explore the gene expression levels across 15 potential ALS-related tissues, including 13 brain-related tissues, skeletal muscle tissue, and whole blood tissue, to better understand the underlying genetic mechanisms of ALS in different tissues.
In summary, this study identified variants and genes associated with ALS risk through GWAS and TWAS analyses. We found support for the TARDBP association through a nearby proxy variant in a larger data set and identified ZNF235 as a potential novel ALS risk gene. Our findings also reinforce the likely complex interplay of genetic and environmental factors in ALS etiology. Follow-up genetic research is important to uncover how identified variants and genes influence motor neuron degeneration in ALS.

Acknowledgment
This research was supported by the Centers for Disease Control (CDC) (R01TS000288) and the NIH Intramural Research Programs, National Institute on Aging (Z01-AG000933). This research used the high-performance computational resources of the Biowulf Linux cluster located at the NIH, Bethesda, MD, USA (biowulf.nih.gov). The authors express their gratitude to the personnel at the Laboratory of Neurogenetics (NIH) for their support. The data sets used for the analyses detailed in this manuscript were acquired from dbGaP through accession numbers phs000101.v5.p1, phs000187, and phs000428 (ncbi.nlm.nih.gov/sites/entrez?db=gap). Data collection and application development for the dbGaP data phs000187 were supported by grants 3P50CA093459, 5P50CA097007, 5R01ES011740, and 5R01CA133996.
Received by Neurology: Genetics February 22, 2024. Accepted in final form July 23, 2024. Submitted and externally peer reviewed. The handling editor was Associate Editor Raymond P. Roos, MD, FAAN.
Neurology: Genetics | Volume 10, Number 5 | October 2024 | Neurology.org/NG e200188(4)

Figure 1
Manhattan Plot for SNPs in Northern New England and Ohio ALS Cohort

Figure 2
Manhattan Plot for SNP-Smoking Interactions in Northern New England and Ohio ALS Cohort

Table 1
Fifteen Genes With Suggestive Significant SNPs in Northern New England and Ohio ALS Cohort
a This table lists the most significant SNP of each gene, and the other SNPs can be found in eTable 2.
b Positions are encoded in GRCh37/hg19.
c Major allele/minor allele (effect allele).
d Minor allele frequency (MAF).
e p-Values of SNPs were calculated from logistic regression adjusting for sex, age at symptom onset, and the first 10 principal components.

Table 2
Two Genes With Suggestive Smoking-Associated SNPs in Northern New England and Ohio ALS Cohort
a Positions are encoded in GRCh37/hg19.
b Major allele/minor allele (effect allele).
c Minor allele frequency (MAF).
d p-Values of SNP-smoking interaction were calculated from logistic regression adjusting for sex, age at symptom onset, smoking main effect, and the first 10 principal components.

Table 3
Eight Gene Expressions Identified by TWAS to be Suggestively Associated With ALS Risk With FDR < 0.3
The CEP43 gene is also known as FGFR1OP.